Display driver IC, display module and electrical device incorporating a graphics engine

ABSTRACT

The invention provides a display driver integrated circuit, for connection to a small-area display, the integrated circuit including a hardware-implemented graphics engine for receiving vector graphics commands and rendering image data for display pixels in dependence upon the received commands, and also including display driver circuitry for driving the connected display in accordance with the image data rendered by the graphics engine. In another aspect the graphics engine is held within the display module, but not embedded in the display driver IC. The invention provides graphics acceleration that increases display performance, but does not significantly increase cost of manufacture. Power consumption in comparison to non-accelerated CPU graphics processing is lowered.

The present invention relates to a display driver IC, display module and electrical device incorporating a graphics engine.

The invention finds application notably in small-area displays found on portable or console electrical devices. Numerous such devices exist, such as PDAs, cordless, mobile and desk telephones, in-car information consoles, hand-held electronic games sets, multifunction watches etc.

In the prior art, there is typically a main CPU, which has the task of receiving display commands, processing them and sending the results to the display module in a pixel-data form describing the properties of each display pixel. The amount of data sent to the display module is proportional to the display resolution and the colour depth. For example, a small monochrome display of 96×96 pixels with a four level grey scale requires a fairly small amount of data to be transferred to the display module. Such a screen does not, however, meet user demand for increasingly attractive and informative displays.

With the demand for colour displays and for sophisticated graphics requiring higher screen resolution, the amount of data to be processed by the CPU and sent to the display module has become much greater. More complex graphics processing places a heavy strain on the CPU and slows the device, so that display reaction and refresh rate may become unacceptable. This is especially problematic for games applications. Another problem is the power drain caused by increased graphics processing, which can substantially shorten the intervals between recharging of battery-powered devices.

In the rather different technical area of personal computers and computer networks, the problem of displaying sophisticated graphics at an acceptable speed is often solved by a hardware graphics engine (also known as a graphics accelerator) on an extra card that is housed in the processor box or as an embedded unit on the motherboard. The graphics engine takes over at least some of the display command processing from the main CPU. Graphics engines are specially developed for graphics processing, so that they are faster and use less power than the CPU for the same graphics tasks. The resultant video data is then sent from the processor box to a separate “dumb” display module.

Known graphics engines used in PCs are specially conceived for large-area displays and are thus highly complex systems requiring separate silicon dies for the high number of gates used. It is impractical to incorporate these engines into portable devices, which have small-area displays and in which size and weight are strictly limited, and which have limited power resources.

Moreover, PC graphics engines are designed to process the types of data used in large-area displays, such as multiple bitmaps of complex images. Data sent to mobile and small-area displays may today be in vector graphics form. Examples of vector graphics languages are Macromedia Flash™ and SVG™. Vector graphics definitions are also used for many gaming Application Programming Interfaces (APIs), for example Microsoft DirectX and Silicon Graphics OpenGL.

In vector graphics, images are defined as multiple complex polygons. This makes vector graphics suited to images that can be easily defined by mathematical functions, such as game screens, text and GPS navigation maps. For such images, vector graphics is considerably more efficient than an equivalent bitmap. That is, a vector graphics file defining the same detail (in terms of complex polygons) as a bitmap file (in terms of each individual display pixel) will contain fewer bytes. The bitmap file is the finished image data in pixel format, which can be copied directly to the display.

A complex polygon is a polygon that can self-intersect and have “holes” in it. Examples of complex polygons are letters and numerals such as “X” and “8” and kanji characters. Vector graphics is, of course, also suitable for definition of the simple polygons such as the triangles that make up the basic primitive for many computer games. The polygon is defined by straight or curved edges and fill commands. In theory there is no limit to the number of edges of each polygon. However, a vector graphics file containing, for instance, a photograph of a complex scene will contain several times more bytes than the equivalent bitmap.

Software graphics processing algorithms are also known, some suitable for use with the high-level/vector graphics languages employed with small-area displays. Some algorithms are available, for example, in “Computer Graphics: Principles and Practice”, Foley, van Dam, Feiner, Hughes, 1996 Edition, ISBN 0-201-84840-6.

Known software graphics algorithms use internal dynamic data structures with linked lists and sort operations. All the vector graphics commands giving polygon edge data must be read into the software engine and stored before it starts rendering (generating an image for display from the high-level commands received). The commands for each polygon are stored in a master list of start and end points for each polygon edge. The polygon is drawn scanline by scanline. For each scanline of the display the software selects which polygon edges cross the scanline and then identifies where each selected edge crosses the scanline. Once the crossing points have been identified, the polygon can be filled between them. The size of the master list that can be processed is limited by the amount of memory available in the software. The known software algorithms thus suffer from the disadvantage that they require a large amount of memory to store all the commands for complex polygons before rendering. This may prejudice manufacturers against incorporating vector graphics processing in mobile devices.

It is desirable to overcome the disadvantages inherent in the prior art and lessen the CPU load and data traffic for display purposes in portable electrical devices.

The invention is defined in the independent claims, to which reference should now be made. Advantageous features are defined in the dependent claims.

According to one embodiment of the invention there is provided a display driver IC, for connection to a small-area display, the IC including a hardware-implemented graphics engine for receiving vector graphics commands and rendering image data for display pixels in dependence upon the received commands, and also including display driver circuitry for driving the connected display in accordance with the image data rendered by the graphics engine.

According to another embodiment of the invention there is provided a display module for incorporation in a portable electrical device and including:

a display;

a hardware-implemented graphics engine for receiving vector graphics commands and rendering image data for display pixels in dependence upon the received commands; and

display driver circuitry connected to the graphics engine and to the display for driving the display in accordance with the image data rendered by the graphics engine.

Although the personal computer (PC) solution is widely used for applications having a “dumb” display module, a separate processor box and a fixed power supply, it could not be used to overcome the graphics processing difficulties for portable devices in which traffic between the CPU and display has a substantial effect on power consumption. This is because the data sent to the dumb display from the processor area is not affected by the introduction of a PC graphics engine. RGB signals are sent from the processor box to the display as before. Thus high data traffic to the display and the resultant power consumption are unchanged.

For the first time, the inventors have realised that a graphics engine need not be provided in the CPU part of a device, but may be held in the display module. They have been able to design a hardware graphics engine that is sufficiently simple that it can be embedded in a display driver IC for a small-area display or in a display module for a portable electrical device. Since the graphics engine is in the display module, high-level graphics commands travel between the CPU and the display part of the mobile device, rather than pixel data. Use of graphics engines as opposed to non-accelerated CPU processing reduces power consumption. Use of the graphics engine in the display module allows considerable savings in power in a device of almost identical size and weight.

Thus, embodiments of the invention allow a portable electrical device to be provided with a display that is capable of displaying images from vector graphics commands whilst maintaining fast display refresh and response times and long battery life.

Reference herein to small-area displays includes displays of a size intended for use in portable electrical devices and excludes, for example, displays used for PCs.

Reference herein to portable devices includes hand-held, worn, pocket and console devices etc. that are sufficiently small and light to be carried by the user.

Preferably, the graphics engine includes control circuitry/logic to read in one vector graphics command at a time, convert the command to spatial image information and then discard the original command before the next command is similarly processed. For example, the engine may read in one edge-drawing command for one polygon edge of an image to be displayed at a time, or one fill command to colour a polygon that has already been read into the engine.

In preferred embodiments, the graphics engine includes edge drawing logic/circuitry linked to an edge buffer (of finite resolution) to store spatial information for (the edges of) any polygon read into the engine. This logic and edge buffer arrangement not only makes it possible to discard the original data for each edge once it has been read into the buffer, in contrast to the previous software engine, but also has the advantage that it imposes no limit on the complexity of the polygon to be drawn, as may be the case with the prior art linked list storage of the high-level commands.

The edge buffer may be of higher resolution than the front buffer of the display memory. For example, the edge buffer may be arranged to store sub-pixels, a plurality of sub-pixels corresponding to a single display pixel. The sub-pixels preferably switch between the set and unset states to store the spatial information. The provision of sub-pixels (more than one for each corresponding pixel of the display) facilitates manipulation of the data and anti-aliasing in an expanded spatial form, before consolidation into the display size. The number of sub-pixels per corresponding display pixel determines the degree of anti-aliasing available. Use of unset and set states only means that the edge buffer requires one bit of memory per sub-pixel.

Preferably, the edge buffer stores each polygon edge as boundary sub-pixels which are set and whose positions in the edge buffer relate to the edge position in the final image. More preferably, the edge drawing logic includes a clipper unit to prevent processing of any polygon edge or polygon edge portion that falls outside the display area.

The graphics engine may include filler circuitry/logic to fill in polygons whose edges have been stored in the edge buffer. This two-pass method has the advantage of simplicity in that the edge buffer format is re-used before the steps to give the colour of the filled polygon. The resultant set sub-pixels need not be re-stored in the edge buffer but can be used directly in the next steps of the process.

The graphics engine preferably includes a back buffer to store part or all of an image before transfer to a front buffer of the display driver memory. Use of a back buffer avoids rendering directly to the front buffer and can prevent flicker in the display image.

The back buffer is preferably of the same resolution as the front buffer of the display memory. That is, each pixel in the back buffer is mapped to a corresponding pixel of the front buffer. The back buffer preferably has the same number of bits per pixel as the front buffer to represent the colour and depth (RGBA values) of the pixel.

There may be combination logic/circuitry provided to sequentially combine each filled polygon produced by the filler circuitry into the back buffer. In this way the image is built up polygon by polygon in the back buffer before transfer to the front buffer for display.

Advantageously, the colour of each pixel stored in the back buffer is determined in dependence on the colour of the pixel in the polygon being processed, the percentage of the pixel covered by the polygon and the colour already present in the corresponding pixel in the back buffer. This colour-blending step is suitable for anti-aliasing.

In one preferred implementation, the edge buffer stores sub-pixels in the form of a grid having a square number of sub-pixels for each display pixel. For example, a grid of 4×4 sub-pixels in the edge buffer may correspond to one display pixel. Each sub-pixel is set or unset depending on the edges to be drawn.

In an alternative embodiment, every other sub-pixel in the edge buffer is not utilised, so that half the square number of sub-pixels is provided per display pixel. In this embodiment, if the edge-drawing circuitry requires that a non-utilised sub-pixel be set, the neighbouring (utilised) sub-pixel is set in its place. This alternative embodiment has the advantage of requiring fewer bits in the edge buffer per display pixel, but lowers the quality of antialiasing somewhat.

The slope of each polygon edge may be calculated from the edge end points and then sub-pixels of the grid set along the line. Preferably, the following rules are used for setting sub-pixels:

one sub-pixel only per horizontal line of the sub-pixel grid is set for each polygon edge;

the sub-pixels are set from top to bottom (in the Y direction);

the last sub-pixel of the line is not set;

any sub-pixels set under the line are inverted.

In this implementation, the filler circuitry may include logic/code acting as a virtual pen (sub-pixel state-setting filler) traversing the sub-pixel grid, which pen is initially off and toggles between the off and on states each time it encounters a set sub-pixel. The resultant data is preferably fed to amalgamation circuitry combining the sub-pixels corresponding to each pixel.

The virtual pen preferably sets all sub-pixels inside the boundary sub-pixels, and includes boundary sub-pixels for right-hand boundaries, and clears boundary sub-pixels for left-hand boundaries or vice versa. This avoids overlapping sub-pixels for polygons that do not mathematically overlap.

Preferably, the virtual pen's traverse is limited so that it does not need to consider sub-pixels outside the polygon edge. For example, a bounding box enclosing the polygon may be provided.

The sub-pixels (from the filler circuitry) corresponding to a single display pixel are preferably amalgamated into a single pixel before combination to the back buffer. Amalgamation allows the back buffer to be of smaller size than the edge buffer, thus reducing memory requirement.

Combination circuitry may be provided for combination to the back buffer, the number of sub-pixels of each amalgamated pixel covered by the filled polygon determining a blending factor for combination of the amalgamated pixel into the back buffer.

The back buffer is copied to the front buffer of the display memory once the image on the part of the display for which it holds information has been entirely rendered. In fact, the back buffer may be of the same size as the front buffer and hold information for the whole display. Alternatively, the back buffer may be smaller than the front buffer and store the information for part of the display only, the image in the front buffer being built from the back buffer in a series of external passes.

In this latter alternative, the process is shortened if only commands relevant to the part of the image to be held in the back buffer are sent to the graphics engine in each external pass (to the CPU).

The graphics engine may be provided with various extra features to enhance its performance.

The graphics engine may further include a curve tessellator to divide any curved polygon edges into straight-line segments and store the resultant segments in the edge buffer.

The graphics engine may be adapted so that the back buffer holds one or more graphics (predetermined image elements) which are transferred to the front buffer at one or more locations determined by the high level language. The graphics may be still or moving images (sprites), or even text letters.

The graphics engine may be provided with a hairline mode, wherein hairlines are stored in the edge buffer by setting sub-pixels in a bitmap and storing the bitmap in multiple locations in the edge buffer to form a line. Such hairlines define lines of one pixel depth and are often used for drawing polygon silhouettes.

When implemented in hardware, the graphics engine may be less than 100K gates in size and preferably less than 50K.

Any display suitable for use with vector graphics can be enhanced with the graphics engine of the present invention. In preferred embodiments the display is an LCD or LED based display and the driver circuitry is source driver circuitry.

The display driver circuitry is preferably driver circuitry for one direction of the display only (that is for rows or for columns). It may also include control circuitry for control of the display. This is generally the case for the source driver of amorphous TFT LCD displays.

The display driver circuitry may also include driver control circuitry for connection to a separate display driver for the other direction. In amorphous TFT LCD displays, the source driver often controls the gate driver.

One graphics engine may be provided per driver IC. However, where the graphics engine is not provided on the driver IC it may service a plurality of ICs in the display module, such as a plurality of source ICs used to drive a slightly larger display. The graphics engine in this case may be provided on its own separate IC, or it may be embedded in a master source driver that controls the remaining source drivers.

The display driver/module may further include display memory, decoder and display latch and timing, data interface logic, control logic and power management logic.

The invention is also applicable to larger electrical devices having a display unit such as PCs and laptops, when vector graphics processing is required (perhaps in addition to other graphics processing).

The invention also relates to an electrical device including:

a processing unit; and

a display unit having a display

wherein the processing unit sends high-level (vector) graphics commands to the display unit and a graphics engine as described herein is provided in the display unit to render image data for display pixels in accordance with the high-level commands.

The graphics engine need not be implemented in hardware, but may alternatively be a software graphics engine. In this case the necessary coded logic could be held in the CPU, along with sufficient code/memory for any of the preferred features detailed above, if they are required. Where circuitry is referred to above, the skilled person will readily appreciate that the same function is available in a code section of a software implementation.

The graphics engine may be a program, preferably held in a processing unit, or may be a record on a carrier or take the form of a signal.

There are several specific advantages of the logical construction of the graphics engine. One advantage is that it does not require memory to hold a polygon edge or fill command once it has been read into the engine. Considerable memory savings are achievable, making the graphics engine particularly suitable for use with portable electrical devices, but also useful for larger electrical devices, which are not necessarily portable.

Preferred features of the present invention will now be described, purely by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram representing function blocks of a preferred graphics engine;

FIG. 2 is a flow chart illustrating operation of a preferred graphics engine;

FIG. 3 is a schematic of an edge buffer showing the edges of a polygon to be drawn and the drawing commands that result in the polygon;

FIG. 4 is a schematic of an edge buffer showing sub-pixels set for each edge command;

FIG. 5 is a schematic of an edge buffer showing a filled polygon;

FIG. 6 is a schematic of the amalgamated pixel view of the filled polygon shown in FIG. 5;

FIGS. 7a and 7b show a quadratic and a cubic Bezier curve respectively;

FIG. 8 shows a curve tessellation process according to an embodiment of the invention;

FIG. 9 gives four examples of linear and radial gradients;

FIG. 10 shows a standard gradient square;

FIG. 11 shows a hairline to be drawn in the edge buffer;

FIG. 12 shows the original circle shape to draw a hairline in the edge buffer, and its shifted position;

FIG. 13 shows the final content of the edge buffer when a hairline has been drawn;

FIG. 14 shows a sequence demonstrating the contents of the edge, back and front buffers in which the back buffer holds ⅓ of the display image in each pass;

FIG. 15 shows one sprite in the back buffer copied to two locations in the front buffer;

FIG. 16 shows an example in which hundreds of small 2D sprites are rendered to simulate spray of small particles;

FIG. 17 shows a hardware implementation for the graphics engine;

FIG. 18 is a schematic representation of a graphics engine according to an embodiment of the invention integrated in a source IC for an LCD or equivalent type display;

FIG. 19 is a schematic representation of a graphics engine according to an embodiment of the invention integrated in a display module and serving two source ICs for an LCD or equivalent type display;

FIG. 20 is a schematic representation of a source driver IC incorporating a graphics engine and its links to CPU, the display area and a gate driver IC;

FIG. 21 shows the functional blocks of an IC driver with an incorporated graphics engine;

FIG. 22 shows TFT type structure and addressing as well as a typical timing diagram for the gate driver IC;

FIG. 23 shows source driving for an LCD display, in which colour information from the front buffer is sent to the display;

FIG. 24 shows a single display pixel with the removal of odd XY locations;

FIG. 25 shows data transfer and power usage between a CPU and display via a graphics engine for a busy screen example; and

FIG. 26 shows data transfer and power usage between a CPU and display via a graphics engine for a rotating triangle example.

FUNCTIONAL OVERVIEW

The function boxes in FIG. 1 illustrate the major logic gate blocks of an exemplary graphics engine 1. The vector graphics commands are fed through the input/output section 10 initially to a curve tessellator 11, which divides any curved edges into straight-line segments. The information passes through to an edge and hairline draw logic block 12 that stores results in an edge buffer 13, which, in this case, has 16 bits per display pixel. The edge buffer information is fed to the scanline filler 14 section to fill in polygons as required by the fill commands of the vector graphics language. The filled polygon information is transferred to the back buffer 15 (in this case, again 16 bits per display pixel), which, in its turn, relays the image to an image transfer block 16 for transfer to the front buffer.

The flow chart shown in FIG. 2 outlines the full rendering process for filled polygons. The polygon edge definition data comes into the engine one edge (in the form of one line or curve) at a time. The command language typically defines the image from back to front, so that polygons in the background of the image are defined (and thus read) before polygons in the foreground. If there is a curve it is tessellated before the edge is stored in the edge buffer. Once the edge has been stored, the command to draw the edge is discarded.

In vector graphics, all the edges of a polygon are defined by commands such as “move”, “line” and “curve” commands before the polygon is filled, so that the tessellation and line drawing loop is repeated (in what is known as a first pass) until a fill command is read. The process then moves on to filling the polygon colour in the edge buffer format. This is known as the second pass. The next step is compositing the polygon colour with the colour already present in the same location in the back buffer. The filled polygon is added to the back buffer one pixel at a time. Only the relevant pixels of the back buffer (those covered by the polygon) are composited with the edge buffer.

Once one polygon is stored in the back buffer, the process then returns to read in the next polygon as described above. The next polygon, which is in front of the previous polygon, is composited into the back buffer in its turn. Once all the polygons have been drawn, the image is transferred from the back buffer to the front buffer, which may be, for example, in the source driver IC of an LCD display.

The Edge Buffer

The edge buffer shown in FIG. 3 is of reduced size for explanatory purposes, and is for 30 pixels (6×5) of the display. It has a sub-pixel grid of 4×4 sub-pixels (16 bits) corresponding to each pixel of the display. Only one bit is required per sub-pixel, which takes the value unset (by default) or set.

The dotted line 20 represents the edges of the polygon to be drawn from the commands shown below.

- Move To (12, 0)
- Line To (20, 19)
- Line To (0, 7)
- Line To (12, 0)
- Move To (11, 4)
- Line To (13, 12)
- Line To (6, 8)
- Line To (11, 4)
- Fill (black)

The command language refers to the sub-pixel coordinates, as is customary for accurate positioning of the corners. All of the commands except the fill command are processed as part of the first pass. The fill command initiates the second pass to fill and combine the polygon to the back buffer.
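The one-command-at-a-time processing described above can be sketched as a small state machine. This is only an illustrative sketch, not the circuit of the invention: the names `Command`, `EngineState` and `process_command` are hypothetical, and the edge drawing and filling steps are replaced by counters so that the control flow stays visible. The point it shows is that the engine keeps only the current pen position between commands, so no list of earlier commands needs to be stored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command encoding (not defined in the specification). */
typedef enum { CMD_MOVE_TO, CMD_LINE_TO, CMD_FILL } CommandKind;

typedef struct {
    CommandKind kind;
    int x, y;            /* sub-pixel coordinates; unused for CMD_FILL */
} Command;

/* Everything the sketch remembers between commands. */
typedef struct {
    int cur_x, cur_y;          /* current pen position */
    unsigned edges_drawn;      /* stand-in for first-pass edge drawing */
    unsigned polygons_filled;  /* stand-in for second-pass fill and combine */
} EngineState;

/* Process a single command; the caller may discard it afterwards. */
void process_command(EngineState *s, const Command *c)
{
    assert(s != NULL && c != NULL);
    switch (c->kind) {
    case CMD_MOVE_TO:                 /* first pass: reposition only */
        s->cur_x = c->x;
        s->cur_y = c->y;
        break;
    case CMD_LINE_TO:                 /* first pass: one edge into the edge buffer */
        /* A real engine would set boundary sub-pixels from
           (cur_x, cur_y) to (c->x, c->y) here. */
        s->edges_drawn++;
        s->cur_x = c->x;
        s->cur_y = c->y;
        break;
    case CMD_FILL:                    /* second pass: fill and combine */
        s->polygons_filled++;
        break;
    }
}
```

Feeding the nine commands of the FIG. 3 example through `process_command` in order would count six edges and one fill.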

FIG. 4 shows sub-pixels set for each line command. Set sub-pixels 21 are shown for illustration purposes only along the dotted line. Due to the reduced size, they cannot accurately represent sub-pixels that would be set using the commands or rules and code shown below.

The edges are drawn into the edge buffer in the order defined in the command language. For each line, the slope is calculated from the end points and then sub-pixels are set along the line. A sub-pixel is set per clock cycle.

The following rules are used for setting sub-pixels:

One sub-pixel only per horizontal line of the sub-pixel grid is set for each polygon edge.

The sub-pixels are set from top to bottom (in the Y direction).

Any sub-pixels set under the line are inverted.

The last sub-pixel of the line is not set.

The inversion rule is to handle self-intersection of complex polygons such as in the character “X”. Without the inversion rule, the exact intersection point might have just one set sub-pixel, which would confuse the fill algorithm described later. Clearly, the necessity for the inversion rule makes it important to avoid overlapping end points of edges. Any such points would disappear, due to inversion.

To avoid such overlapping end points of consecutive lines on the same polygon the lowest sub-pixel is not set.

For example, with the command list:

Moveto (0,0)

Lineto (0,100)

Lineto (0,200)

The first edge is effectively drawn from (0,0) to (0,99) and the second line from (0,100) to (0,199). The result is a solid line. Since the line is drawn from top to bottom the last sub-pixel is also the lowest sub-pixel (unless the line is perfectly horizontal, as in this case).
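The inversion rule amounts to an exclusive-OR on a single bit: setting a sub-pixel that is already set clears it again. A minimal sketch follows, assuming one 16-bit word per display pixel with the bit order used by the specification's own code (bit = x within the pixel, plus four times y within the pixel). The function name `toggle_subpixel` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Invert one sub-pixel of a 4x4 cell held in a 16-bit word.
   x and y are sub-pixel coordinates; only their two low bits
   select the position inside the cell. */
uint16_t toggle_subpixel(uint16_t cell, int x, int y)
{
    assert(x >= 0 && y >= 0);
    int bit = (x & 3) | ((y & 3) << 2);
    return (uint16_t)(cell ^ (1u << bit));
}
```

Applying the function twice at the same position returns the original word, which is why two edges that end on the same sub-pixel would cancel each other unless the last sub-pixel of each line is left unset.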

The following code section implements an algorithm for setting boundary sub-pixels according to the above rules. The code before the “for (iy=y0+1; iy<y1; iy++)” loop is run once per edge and the code in the “for (iy=y0+1; iy<y1; iy++)” loop is run every clock cycle.

unsigned short sbuf[262144]; // Edge buffer, one 16-bit word per display pixel (declaration assumed; size matches the idx&262143 mask)

void edgedraw(int x0, int y0, int x1, int y1)
{
  float tmpx, tmpy;
  float step, dx, dy;
  int iy, ix;
  int bit, idx;

  // Remove non visible lines
  if ((y0==y1)) return;                     // Horizontal line
  if ((y0<0)&&(y1<0)) return;               // Out top
  if ((x0>(176*4))&&(x1>(176*4))) return;   // Out right
  if ((y0>(220*4))&&(y1>(220*4))) return;   // Out bottom

  // Always draw from top to bottom (Y Sort)
  if (y1<y0)
  {
    tmpx=x0; x0=x1; x1=tmpx;
    tmpy=y0; y0=y1; y1=tmpy;
  }

  // Init line
  dx=x1-x0;
  dy=y1-y0;
  if (dy==0) dy=1;
  step=dx/dy;  // Calculate slope of the line
  ix=x0;
  iy=y0;

  // Bit order in sbuf (16 sub-pixels per pixel)
  // 0123
  // 4567
  // 89ab
  // cdef
  // Index= YYYYYYYXXXXXXXyyxx
  // four lsb of index used to index bits within the unsigned short
  if (ix<0) ix=0;
  if (ix>(176*4)) ix=176*4;
  if (iy>0)
  {
    idx=((ix>>2)&511)|((iy>>2)<<9); // Integer part
    bit=(ix&3)|(iy&3)<<2;
    sbuf[idx&262143]^=(1<<bit);     // XOR implements the inversion rule
  }
  for (iy=y0+1;iy<y1;iy++)
  {
    if (iy<0) continue;
    if (iy>220*4) continue;
    ix=x0+step*(iy-y0);
    if (ix<0) ix=0;
    if (ix>(176*4)) ix=176*4;
    idx=((ix>>2)&511)|((iy>>2)<<9); // Integer part
    bit=(ix&3)|(iy&3)<<2;
    sbuf[idx&262143]^=(1<<bit);     // XOR implements the inversion rule
  }
}

FIG. 5 shows the filled polygon in sub-pixel definition. The dark sub-pixels are set. It should be noted here that the filling process is carried out by filler circuitry and that there is no need to re-store the result in the edge buffer. The figure is merely a representation of the set sub-pixels sent to the next step in the process. The polygon is filled by a virtual marker or pen travelling across the sub-pixel grid, which pen is initially off and toggles between the off and on states each time it encounters a set sub-pixel. The pen moves from the left to the right in this example, one sub-pixel at a time. If the pen is up and the sub-pixel is set, then the sub-pixel is left set and the pen sets the following sub-pixels until it reaches another set sub-pixel. The second set sub-pixel is cleared, and the pen, up again, continues to the right.

This method includes the boundary sub-pixels on the left of the polygon but leaves out sub-pixels on the right boundary. The reason for this is that if two adjacent polygons share the same edge, there must be consistency as to which polygon any given sub-pixel is assigned to, to avoid overlapped sub-pixels for polygons that do not mathematically overlap.
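The pen behaviour can be sketched for a single horizontal line of sub-pixels. This is an illustrative sketch only: it assumes each sub-pixel is held in its own byte (0 or 1) rather than packed into 16-bit words, and the name `fill_subpixel_row` is hypothetical. Toggling the pen before writing the output is what makes the left boundary sub-pixel part of the polygon and the right boundary sub-pixel not.

```c
#include <assert.h>
#include <stddef.h>

/* Fill one line of sub-pixels from left to right.
   edge[x] is 1 where a boundary sub-pixel is set in the edge buffer;
   out[x] receives 1 where the sub-pixel belongs to the filled polygon. */
void fill_subpixel_row(const unsigned char *edge, unsigned char *out, size_t width)
{
    int pen_on = 0;                      /* the pen starts in the off state */
    assert(edge != NULL && out != NULL);
    for (size_t x = 0; x < width; x++) {
        if (edge[x])
            pen_on = !pen_on;            /* toggle at every set boundary sub-pixel */
        out[x] = (unsigned char)pen_on;  /* left boundary kept, right boundary cleared */
    }
}
```

For a line with boundary sub-pixels at positions 2 and 6, the sketch marks positions 2 to 5 as filled and leaves position 6 clear, so a neighbouring polygon starting at position 6 would not overlap.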

Once the polygon in the edge buffer has been filled, the sub-pixels belonging to each pixel can be amalgamated and combined into the back buffer. The coverage of each 4×4 mini-grid gives the depth of colour. For example, the third pixel from the left in the top row of pixels has 12 of its 16 sub-pixels set. Its coverage is 75%.
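Amalgamation of one 4×4 mini-grid reduces to counting the set bits of a 16-bit word. The sketch below assumes the filled sub-pixels of one display pixel are available as such a word; the function names are hypothetical and the bit-counting loop is a common software technique, not necessarily how the hardware does it.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set sub-pixels (0..16) in one 4x4 cell. */
unsigned coverage_count(uint16_t cell)
{
    unsigned n = 0;
    while (cell) {
        cell &= (uint16_t)(cell - 1);   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Coverage of the display pixel as a whole-number percentage. */
unsigned coverage_percent(uint16_t cell)
{
    unsigned n = coverage_count(cell);
    assert(n <= 16);
    return n * 100u / 16u;
}
```

A cell with twelve set sub-pixels, such as 0x0FFF, gives a count of 12 and a coverage of 75%, matching the example in the text.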

Combination into the Back Buffer

FIG. 6 shows each pixel to be combined into the back buffer and its 4 bit (0...F hex) blending factor calculated from the sub-pixels set per pixel as shown in FIG. 5. One pixel is combined into the back buffer per clock cycle. A pixel is only combined if a value other than 0 is stored in the edge buffer.
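The combination step can be sketched for a single colour channel. Several details here are assumptions made for illustration, since the text does not specify them: the channel is taken to be 8 bits wide, a fully covered pixel (16 set sub-pixels) is assumed to saturate at the largest 4 bit factor F, and the blend is assumed to be a rounded linear interpolation. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map a sub-pixel count (0..16) to a 4 bit blending factor (0..15).
   Assumption: 16 set sub-pixels saturate at 15. */
uint8_t blend_factor(unsigned set_subpixels)
{
    assert(set_subpixels <= 16);
    return (uint8_t)(set_subpixels > 15 ? 15 : set_subpixels);
}

/* Blend one 8-bit channel of the polygon colour (src) over the value
   already in the back buffer (dst), weighted by the factor. */
uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t factor)
{
    assert(factor <= 15);
    return (uint8_t)((src * factor + dst * (15 - factor) + 7) / 15);
}
```

With a factor of 0 the back buffer value is returned unchanged, which is consistent with skipping pixels whose edge buffer value is 0; with a factor of F the polygon colour replaces it.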

The back buffer is not required to be the same size as the edge buffer and can be smaller, for example corresponding to the display size or a part of the display.

The resolution of the polygon in the back buffer is one quarter of its size in the edge buffer (in each direction) in this example. The benefit of the two-pass method and amalgamation before storage of the polygon in the back buffer is that the total amount of memory required is significantly reduced. The edge buffer requires 1 bit per sub-pixel for the set and unset values. However, the back buffer requires 16 bits per pixel to represent the shade to be displayed and, if the back buffer were used to set boundary sub-pixels and fill the resultant polygons, the amount of memory required would be eight times greater than the combination of the edge and back buffers, that is, sixteen 16 bit buffers would be required, rather than two.
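The factor of eight can be checked with a short calculation. The sketch below uses a 176×220 pixel display purely as an example, that being the size implied by the clipping constants in the edgedraw code; the function names are hypothetical.

```c
#include <assert.h>

/* Bytes needed for width x height entries of bits_per_entry bits each. */
unsigned long buffer_bytes(unsigned long width, unsigned long height,
                           unsigned long bits_per_entry)
{
    unsigned long bits = width * height * bits_per_entry;
    assert(bits % 8 == 0);
    return bits / 8;
}

/* Two-pass scheme: 1-bit edge buffer at 4x4 sub-pixel resolution
   (16 bits per pixel) plus a 16-bit back buffer at pixel resolution. */
unsigned long two_pass_bytes(unsigned long w, unsigned long h)
{
    return buffer_bytes(w, h, 16) + buffer_bytes(w, h, 16);
}

/* Alternative: a single 16-bit buffer at full sub-pixel resolution. */
unsigned long subpixel_colour_bytes(unsigned long w, unsigned long h)
{
    return buffer_bytes(w * 4, h * 4, 16);
}
```

For 176×220 pixels the two-pass scheme needs 154,880 bytes, while a 16 bit buffer at sub-pixel resolution would need 1,239,040 bytes, which is eight times as much.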

Edge Buffer Compression To 8 Bits

The edge buffer is described above as having a 16 bit value organized as 4×4 bits. An alternative arrangement reduces the memory required by 50% by lowering the edge buffer data per pixel to 8 bits.

This is accomplished by removing odd XY locations from the 4×4 layout for a single display pixel as shown in FIG. 24.

If a sub-pixel to be drawn to the edge buffer has coordinates that belong to a location without bit storage, it is moved one step to the right. For example, the top right sub-pixel in the partial grid shown above is shifted to the partial grid for the next display pixel to the right. The following code line is added to the code shown above.

    if ((LSB(X) xor LSB(Y)) == 1) X = X + 1;   // LSB() returns the lowest bit of a coordinate; a move to the right increments X

This leaves only eight locations inside the 4×4 layout that can receive sub-pixels. These locations are packed to 8 bit data and stored to the edge buffer as before.
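A minimal sketch of the coordinate adjustment and packing is given below. It follows the textual description, in which a sub-pixel without storage is moved one step to the right. The bit order inside the 8-bit word (two bits per sub-pixel row) and the function names are assumptions and are not taken from the figures.

```c
#include <stdint.h>

/* Move a sub-pixel one step to the right when (x, y) falls on a
 * location without bit storage, i.e. when the lowest bits of x and y
 * differ. Returns the adjusted X coordinate. */
int snap_x(int x, int y)
{
    return ((x ^ y) & 1) ? x + 1 : x;
}

/* Bit index (0..7) inside the 8-bit word of the display pixel at
 * (x >> 2, y >> 2). The coordinates must already be snapped. Each
 * sub-pixel row keeps two locations, so row r uses bits 2r and 2r+1
 * (assumed packing order). */
int packed_bit(int x, int y)
{
    return (y & 3) * 2 + ((x & 3) >> 1);
}
```

With this layout the top right sub-pixel of a pixel, at (3, 0), is snapped to (4, 0), which is bit 0 of the next display pixel to the right.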

The 8 bit per pixel edge buffer is an alternative to, rather than a replacement for, the 16 bit per pixel buffer. The antialiasing quality drops very little, so the benefit of 50% less memory may outweigh this disadvantage.

Rendering of Curves

FIGS. 7a and 7b show a quadratic and a cubic Bezier curve respectively. Both are always symmetrical for a symmetrical control point arrangement. Polygon drawing of such curves is effected by splitting the curve into short line segments (tessellation). The curve data is sent as vector graphics commands to the graphics engine. Tessellation in the graphics engine, rather than in the CPU, reduces the amount of data sent to the display module per polygon. A quadratic Bezier curve as shown in FIG. 7a has three control points. It can be defined as Moveto (x1, y1), CurveQto (x2, y2, x3, y3).

A cubic Bezier curve always passes through the end points and is tangent to the line between the first two control points and to the line between the last two control points. A cubic curve can be defined as Moveto (x1, y1), CurveCto (x2, y2, x3, y3, x4, y4).

The following code shows two functions. Each function is called N times during the tessellation process, where N is the number of line segments produced. Function Bezier3 is used for quadratic curves and Bezier4 for cubic curves. Input values p1-p4 are control points and mu is a value increasing from 0 to 1 during the tessellation process. Value 0 in mu returns p1, and value 1 in mu returns the last control point.

    /* Point type used by the functions below (assumed definition). */
    typedef struct { double x, y; } XY;

    /* Point on a quadratic Bezier curve at parameter mu (0..1). */
    XY Bezier3(XY p1, XY p2, XY p3, double mu)
    {
        double mum1, mum12, mu2;
        XY p;

        mu2 = mu * mu;
        mum1 = 1 - mu;
        mum12 = mum1 * mum1;
        p.x = p1.x * mum12 + 2 * p2.x * mum1 * mu + p3.x * mu2;
        p.y = p1.y * mum12 + 2 * p2.y * mum1 * mu + p3.y * mu2;
        return (p);
    }

    /* Point on a cubic Bezier curve at parameter mu (0..1). */
    XY Bezier4(XY p1, XY p2, XY p3, XY p4, double mu)
    {
        double mum1, mum13, mu3;
        XY p;

        mum1 = 1 - mu;
        mum13 = mum1 * mum1 * mum1;
        mu3 = mu * mu * mu;
        p.x = mum13*p1.x + 3*mu*mum1*mum1*p2.x + 3*mu*mu*mum1*p3.x + mu3*p4.x;
        p.y = mum13*p1.y + 3*mu*mum1*mum1*p2.y + 3*mu*mu*mum1*p3.y + mu3*p4.y;
        return (p);
    }

The following code is an example of how to tessellate a quadratic Bezier curve defined by three control points (sx, sy), (x0, y0) and (x1, y1). The tessellation counter x starts from one, because if it were zero the function would return the first control point, resulting in a line of zero length.

    #define split 8

    XY p, p1, p2, p3;
    int x;

    p1.x = sx; p1.y = sy;
    p2.x = x0; p2.y = y0;
    p3.x = x1; p3.y = y1;

    for (x = 1; x <= split; x++)
    {
        /* The cast avoids integer division, which would give mu = 0
           for every step except the last. */
        p = Bezier3(p1, p2, p3, (double)x / split);  // Calculate next point on curve path
        LineTo(p.x, p.y);                            // Send LineTo command to Edge Draw unit
    }

FIG. 8 shows the curve tessellation process defined in the above code sections, which returns N line segments. The central loop repeats for each line segment.

Fill Types

The colour of the polygon defined in the high-level language may be solid; that is, one constant RGBA (red, green, blue, alpha) value for the whole polygon or may have a radial or linear gradient.

A gradient can have up to eight control points. Colours are interpolated between the control points to create the colour ramp. Each control point is defined by a ratio and an RGBA colour. The ratio determines the position of the control point in the gradient, the RGBA value determines its colour.
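The interpolation along the colour ramp can be sketched as follows. The description does not state the numeric range of the ratio or the interpolation arithmetic, so the 0..255 ratio, the 8-bit channels, the linear interpolation and all names below are assumptions made for illustration.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } RGBA;

/* One gradient control point. Storing the ratio as 0..255 is an assumption. */
typedef struct { uint8_t ratio; RGBA colour; } GradientStop;

/* Linear interpolation of one 8-bit channel: a + (b - a) * num / den. */
static uint8_t mix8(uint8_t a, uint8_t b, int num, int den)
{
    return (uint8_t)(a + ((int)b - (int)a) * num / den);
}

/* Colour at position t (0..255) along a ramp of `count` control points
 * (1 to 8), which must be sorted by ascending ratio. */
RGBA gradient_colour(const GradientStop *stops, int count, uint8_t t)
{
    if (t <= stops[0].ratio)
        return stops[0].colour;

    for (int i = 1; i < count; i++) {
        if (t <= stops[i].ratio) {
            RGBA c0 = stops[i - 1].colour, c1 = stops[i].colour;
            int num = t - stops[i - 1].ratio;
            int den = stops[i].ratio - stops[i - 1].ratio;  /* > 0 here */
            RGBA out;
            out.r = mix8(c0.r, c1.r, num, den);
            out.g = mix8(c0.g, c1.g, num, den);
            out.b = mix8(c0.b, c1.b, num, den);
            out.a = mix8(c0.a, c1.a, num, den);
            return out;
        }
    }
    return stops[count - 1].colour;     /* past the last control point */
}
```

With two control points, opaque black at ratio 0 and opaque white at ratio 255, a position of 51 gives a grey with each colour channel equal to 51.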

Whatever the fill type, the colour of each pixel is calculated during the blending process when the filled polygon is combined into the back buffer. The radial and linear gradient types merely require more complex processing to incorporate the position of each individual pixel along the colour ramp.

FIG. 9 gives four examples of linear and radial gradients. All these can be freely used with the graphics engine of the invention.

FIG. 10 shows a standard gradient square. All gradients are defined in a standard space called the gradient square. The gradient square is centered at (0,0), and extends from (−16384, −16384) to (16384, 16384).

In FIG. 10 a linear gradient is mapped onto a circle 4096 units in diameter, and centered at (2048, 2048). The 2×3 matrix required for this mapping is:

$$\begin{matrix} 0.125 & 0.000 \\ 0.000 & 0.125 \\ 2048.000 & 2048.000 \end{matrix}$$

That is, the gradient is scaled to one-eighth of its original size (32768/4096=8), and translated to (2048, 2048).
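The effect of this matrix can be illustrated with a small sketch that maps points of the gradient square. The row-vector convention (x, y, 1) multiplied by the three rows of the matrix, and the type and function names, are assumptions; for this particular matrix the off-diagonal terms are zero, so the choice of convention does not change the result.

```c
typedef struct { double x, y; } Point;

/* 2x3 matrix stored as three rows of two values:
 * m[0] and m[1] hold scale and rotation, m[2] holds the translation. */
typedef struct { double m[3][2]; } Matrix2x3;

/* Map a point of the gradient square, treating it as the row vector
 * (x, y, 1) multiplied by the matrix (assumed convention). */
Point map_point(const Matrix2x3 *mat, Point p)
{
    Point q;
    q.x = p.x * mat->m[0][0] + p.y * mat->m[1][0] + mat->m[2][0];
    q.y = p.x * mat->m[0][1] + p.y * mat->m[1][1] + mat->m[2][1];
    return q;
}
```

With the matrix shown above, the corner (−16384, −16384) maps to (0, 0), the centre (0, 0) maps to (2048, 2048) and the corner (16384, 16384) maps to (4096, 4096).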

FIG. 11 shows a hairline 23 to be drawn in the edge buffer. A hairline is a straight line that has a width of one pixel. The graphics engine supports rendering of hairlines in a special mode. When the hairline mode is on, the edge draw unit does not apply the four special rules described for normal edge drawing. Also, the content of the edge buffer is handled differently. The hairlines are drawn to the edge buffer while doing the fill operation on the fly. That is, there is no separate fill operation. So, once all the hairlines are drawn for the current drawing primitive (polygon silhouette for example), each pixel in the edge buffer contains filled sub-pixels ready for the scanline filler to calculate the set sub-pixels for coverage information and do the normal colour operations for the pixel (blending to the back buffer). The line stepping algorithm used here is the standard and well known Bresenham line algorithm, with the stepping at sub-pixel level.

For each step a 4×4 sub-pixel image 24 of a solid circle is drawn (with an OR operation) to the edge buffer. This is the darker shape shown in FIG. 11. As the offset of this 4×4 sub-pixel shape does not always align exactly with the 4×4 sub-pixels in the edge buffer, it may be necessary to use up to four read-modify-write cycles to the edge buffer, where the data is bit shifted in the X and Y directions to the correct position.

The logic implementing the Bresenham algorithm is very simple, and may be provided as a separate block inside the edge draw unit. It will be idle in the normal polygon rendering operation.

FIG. 12 shows the original circle shape, and its shifted position. The left-hand image shows the 4×4 sub-pixel shape used to “paint” the line into the edge buffer. On the right is an example of the shifted bitmap of three steps right and two steps down. Four memory accesses are necessary to draw the full shape into the memory.
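The splitting of a shifted 4×4 shape over up to four edge buffer words can be modelled in software as below. This is a sketch under stated assumptions: each pixel of the edge buffer is one 16-bit word with bit index row*4 + column, the buffer dimensions `EB_W` and `EB_H` are arbitrary, the coordinates are non-negative, and the function name is hypothetical. Hardware would use bit shifts rather than the per-bit loop shown here.

```c
#include <stdint.h>

enum { EB_W = 8, EB_H = 8 };   /* hypothetical edge buffer size in pixels */

/* OR a 4x4 sub-pixel shape into the edge buffer with its top-left
 * corner at sub-pixel position (sx, sy), sx >= 0 and sy >= 0.
 * Returns the number of read-modify-write accesses used (0 to 4). */
int stamp_shape(uint16_t eb[EB_H][EB_W], uint16_t shape, int sx, int sy)
{
    uint16_t part[2][2] = { {0, 0}, {0, 0} };
    int px = sx >> 2, py = sy >> 2;     /* pixel holding the top-left corner */
    int dx = sx & 3,  dy = sy & 3;      /* shift inside that pixel */
    int writes = 0;

    /* Distribute the set bits of the shape over the (up to) 2x2 block
     * of pixels that the shifted shape overlaps. */
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            if (!(shape & (1u << (r * 4 + c))))
                continue;
            int x = dx + c, y = dy + r; /* 0..6 relative to pixel (px, py) */
            part[y >> 2][x >> 2] |= (uint16_t)(1u << ((y & 3) * 4 + (x & 3)));
        }
    }

    /* One read-modify-write per pixel that actually receives bits. */
    for (int j = 0; j < 2; j++) {
        for (int i = 0; i < 2; i++) {
            if (part[j][i] && py + j < EB_H && px + i < EB_W) {
                eb[py + j][px + i] |= part[j][i];
                writes++;
            }
        }
    }
    return writes;
}
```

A shape placed on a pixel boundary needs a single access, whereas a full 4×4 shape shifted three steps right and two steps down, as in FIG. 12, touches four pixels.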

The same concept could be used to draw lines with width of more than one pixel but efficiency would drop dramatically as the overlapping areas of the shapes with earlier drawn shapes would be bigger.

FIG. 13 shows the final content of the edge buffer, with the sub-pixel hairline 25 which has been drawn and filled simultaneously as explained above. The next steps are amalgamation and combination into the back buffer.

The following is a generic example of the Bresenham line algorithm implemented in Pascal. The code starting with the comment “{ Draw the pixels }” is run each clock cycle, and the remaining code once per line of sub-pixels.

    procedure Line(x1, y1, x2, y2 : integer; color : byte);
    var
      i, deltax, deltay, numpixels,
      d, dinc1, dinc2,
      x, xinc1, xinc2,
      y, yinc1, yinc2 : integer;
    begin
      { Calculate deltax and deltay for initialisation }
      deltax := abs(x2 - x1);
      deltay := abs(y2 - y1);

      { Initialize all vars based on which is the independent variable }
      if deltax >= deltay then
        begin
          { x is independent variable }
          numpixels := deltax + 1;
          d := (2 * deltay) - deltax;
          dinc1 := deltay shl 1;
          dinc2 := (deltay - deltax) shl 1;
          xinc1 := 1;
          xinc2 := 1;
          yinc1 := 0;
          yinc2 := 1;
        end
      else
        begin
          { y is independent variable }
          numpixels := deltay + 1;
          d := (2 * deltax) - deltay;
          dinc1 := deltax shl 1;
          dinc2 := (deltax - deltay) shl 1;
          xinc1 := 0;
          xinc2 := 1;
          yinc1 := 1;
          yinc2 := 1;
        end;

      { Make sure x and y move in the right directions }
      if x1 > x2 then
        begin
          xinc1 := -xinc1;
          xinc2 := -xinc2;
        end;
      if y1 > y2 then
        begin
          yinc1 := -yinc1;
          yinc2 := -yinc2;
        end;

      { Start drawing at }
      x := x1;
      y := y1;

      { Draw the pixels }
      for i := 1 to numpixels do
        begin
          PutPixel(x, y, color);
          if d < 0 then
            begin
              d := d + dinc1;
              x := x + xinc1;
              y := y + yinc1;
            end
          else
            begin
              d := d + dinc2;
              x := x + xinc2;
              y := y + yinc2;
            end;
        end;
    end;

Back Buffer Size

The back buffer in which all the polygons are stored before transfer to the display module is ideally the same size as the front buffer (and has display module resolution, that is, one pixel of the back buffer at any time always corresponds to one pixel of the display). But in some configurations it is not possible to have a full size back buffer for size/cost reasons.

The size of the back buffer can be chosen prior to the hardware implementation. It is always the same size as, or smaller than, the front buffer. If it is smaller, it normally corresponds to the entire display width, but a section of the display height, as shown in FIG. 14. In this case, the edge buffer 13 need not be of the same size as the front buffer. It is required, in any case, to have one sub-pixel grid per pixel of the back buffer.

If the back buffer 15 is smaller than the front buffer 17 as in FIG. 14, the rendering operation is done in multiple external passes. This means that the software running on the host CPU must re-send at least some of the data to the graphics engine, increasing the total amount of data being transferred for the same resulting image.

The FIG. 14 example shows a back buffer 15 that is ⅓ of the front buffer 17 in the vertical direction. In the example, only one triangle is rendered. The triangle is rendered in three passes, filling the front buffer in three steps. It is important that everything in the part of the image in the back buffer is rendered completely before the back buffer is copied to the front buffer. So, regardless of the complexity of the final image (number of polygons), in this example configuration there would always be a maximum of three image transfers from the back buffer to the front buffer.

The full database in the host application containing all the moveto, lineto, curveto commands does not have to be sent three times to the graphics engine. Only commands which are within the current region of the image, or commands that cross the top or bottom edge of the current region are needed. Thus, in the FIG. 14 example, there is no need to send the lineto command which defines the bottom left edge of the triangle for the top region, because it does not touch the first (top) region. In the second region all three lineto commands must be sent as all lines touch the region. And in the third region, the lineto command for the top left edge of the triangle does not have to be transferred.

Clearly, the end result would be correct without this selection of code to be sent but selection reduces the bandwidth requirement between the CPU and the graphics engine. For example, in an application that renders a lot of text on the screen, a quick check of the bounding box of each text string to be rendered will result in fast rejection of many rendering commands.
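The selection of commands per region can be sketched as a simple vertical overlap test on the host side. The function name and the use of inclusive row ranges are assumptions made for the example; a host application could apply the same test to the bounding box of a text string.

```c
/* Returns 1 if an edge (or bounding box) spanning rows y0..y1, given
 * in either order, touches the region covering rows region_top to
 * region_bottom inclusive, so that its command must be sent for this
 * pass. Returns 0 if the command can be skipped for this region. */
int touches_region(int y0, int y1, int region_top, int region_bottom)
{
    int lowest  = y0 < y1 ? y0 : y1;
    int highest = y0 < y1 ? y1 : y0;
    return highest >= region_top && lowest <= region_bottom;
}
```

For a hypothetical 240 row display split into three regions of 80 rows, an edge running from row 100 to row 200 is skipped for the top region (rows 0 to 79) and sent for the middle and bottom regions.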

Sprites

Now that the concept of the smaller size back buffer and its transfer to the front buffer has been illustrated, it is easy to understand how a similar process can be used for rendering of 2D or 3D graphics or sprites. A sprite is an image, usually a moving one, such as a character in a game or an icon. The sprite is a complete entity that is transferred to the front buffer at a defined location. Thus, where the back buffer is smaller than the front buffer, the back buffer content in each pass can be considered as one 2D sprite.

The content of the sprite can be either rendered with polygons, or provided by simply transferring a bitmap from the CPU. By having configurable width, height and XY offset to indicate which part of the back buffer is transferred to which XY location in the front buffer, 2D sprites can be transferred to the front buffer.

The FIG. 14 example is in fact rendering three sprites to the front buffer, where the size of the sprite is the full back buffer and the offset of the destination is moved from top to bottom to cover the full front buffer. Also the content of the sprite (back buffer) is rendered between the image transfers.

FIG. 15 shows one sprite in the back buffer copied to two locations in the front buffer. Since the width, height and XY offset of the sprite can be configured, it is also possible to store multiple different sprites in the back buffer, and draw them to any location in the front buffer in any order, and also multiple times without the need to upload the sprite bitmap from the host to the graphics engine. One practical example of such operation would be to store small bitmaps of each character of a font set in the back buffer. It would then be possible to draw bitmapped text/fonts into the front buffer by issuing image transfer commands from the CPU, where the XY offset of the source (back buffer) is defined for each letter.
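A software model of such an image transfer is sketched below. The `Buffer` type, the 16 bit per pixel row-major layout and the function name are assumptions for illustration; the hardware transfer command itself is not specified here.

```c
#include <stdint.h>

typedef struct {
    uint16_t *pixels;    /* 16 bits per pixel, row-major (assumed layout) */
    int width, height;   /* size in pixels */
} Buffer;

/* Copy a w x h rectangle from (src_x, src_y) in the back buffer to
 * (dst_x, dst_y) in the front buffer. Pixels that would fall outside
 * either buffer are skipped. */
void transfer_sprite(const Buffer *back, Buffer *front,
                     int src_x, int src_y, int w, int h,
                     int dst_x, int dst_y)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int sx = src_x + x, sy = src_y + y;
            int dx = dst_x + x, dy = dst_y + y;
            if (sx < 0 || sy < 0 || sx >= back->width || sy >= back->height)
                continue;
            if (dx < 0 || dy < 0 || dx >= front->width || dy >= front->height)
                continue;
            front->pixels[dy * front->width + dx] =
                back->pixels[sy * back->width + sx];
        }
    }
}
```

Calling the function twice with the same source rectangle and two different destination offsets corresponds to the situation of FIG. 15, in which one sprite is copied to two locations without being uploaded again.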

FIG. 16 shows an example in which hundreds of small 2D sprites are rendered to simulate a spray of small particles.

Hardware Implementation of the Graphics Engine

The graphics engine has been implemented in hardware as shown in FIG. 17. The figure shows a more detailed block diagram of the internal units of the implementation.

The edge drawing circuitry is formed by the edge draw units shown in FIG. 17, together with the edge buffer memory controller.

The filler circuitry is shown as the scanline filler, with the virtual pen and amalgamation logic (for amalgamation of the sub-pixels into corresponding pixels) in the mask generator unit. The back buffer memory controller combines the amalgamated pixel into the back buffer.

A ‘clipper’ mechanism is used for removing non-visible lines in this hardware implementation. Its purpose is to clip polygon edges so that their end points are always within the screen area while maintaining the slope and position of the line. This is basically a performance optimisation block and its function is implemented as the following four if clauses in the edgedraw function:

    if (iy < 0) continue;
    if (iy > 220*4) continue;
    if (ix < 0) ix = 0;
    if (ix > (176*4)) ix = 176*4;

If both end points are outside the display screen area to the same side, the edge is not processed; otherwise, for any end points outside the screen area, the clipper calculates where the edge crosses onto the screen and processes the “visible” part of the edge from the crossing point only.

In hardware it makes more sense to clip the end points as described above rather than reject individual sub-pixels, because if the edge is very long and goes far outside of the screen, the hardware would spend many clock cycles not producing usable sub-pixels. These clock cycles are better spent in clipping.
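End point clipping in the vertical direction can be sketched as follows. This is a simplified software model covering only the top and bottom screen limits; the floating point arithmetic, the function name and the parameter layout are assumptions, and the actual clipper block may differ.

```c
/* Clip the edge (x0, y0)-(x1, y1) to the row range ymin..ymax while
 * keeping its slope and position. Returns 0 if both end points lie
 * beyond the same limit (the edge is not processed), otherwise 1 with
 * any outside end point moved to the crossing point. */
int clip_edge_rows(double *x0, double *y0, double *x1, double *y1,
                   double ymin, double ymax)
{
    double ax = *x0, ay = *y0, bx = *x1, by = *y1;

    /* Both end points beyond the same side: nothing visible. */
    if ((ay < ymin && by < ymin) || (ay > ymax && by > ymax))
        return 0;

    /* Change in x per row. The divisor is non-zero whenever an end
     * point has to be moved, because a horizontal edge outside the
     * range was rejected above. */
    double slope = (by != ay) ? (bx - ax) / (by - ay) : 0.0;

    if (*y0 < ymin)      { *x0 = ax + slope * (ymin - ay); *y0 = ymin; }
    else if (*y0 > ymax) { *x0 = ax + slope * (ymax - ay); *y0 = ymax; }

    if (*y1 < ymin)      { *x1 = ax + slope * (ymin - ay); *y1 = ymin; }
    else if (*y1 > ymax) { *x1 = ax + slope * (ymax - ay); *y1 = ymax; }

    return 1;
}
```

For the 220 row example with four sub-pixel rows per pixel, an edge from (0, −100) to (100, 100) clipped to rows 0 to 880 starts at the crossing point (50, 0) and keeps its original slope.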

The fill traverse unit reads data from the edge buffer and sends the incoming data to the mask generator. The fill traverse need not step across the entire sub-pixel grid. For example it may simply process all the pixels belonging to a rectangle (bounding box) enclosing the complete polygon. This guarantees that the mask generator receives all the sub-pixels of the polygon. In some cases this bounding box may be far from the optimal traverse pattern. Ideally the fill traverse unit should omit sub-pixels that are outside of the polygon. There are a number of ways to add intelligence to the fill traverse unit to avoid reading such empty sub-pixels from the edge buffer. One example of such an optimisation is to store the left-most and right-most sub-pixel sent to the edge buffer for each scanline (or horizontal line of sub-pixels) and then traverse only between these left and right extremes.
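The per-scanline extent optimisation mentioned above can be modelled as follows. The structure, the function names and the fixed number of sub-pixel rows (taken from the 220 row clipper example) are assumptions made for illustration.

```c
#include <limits.h>

enum { GRID_ROWS = 220 * 4 };   /* sub-pixel rows for the 220 row example */

/* Left-most and right-most sub-pixel written on each scanline. */
typedef struct {
    int left[GRID_ROWS];
    int right[GRID_ROWS];
} Extents;

/* Mark every scanline as empty before a polygon is drawn. */
void extents_reset(Extents *e)
{
    for (int y = 0; y < GRID_ROWS; y++) {
        e->left[y] = INT_MAX;
        e->right[y] = INT_MIN;
    }
}

/* Record a sub-pixel sent to the edge buffer at (x, y). */
void extents_add(Extents *e, int x, int y)
{
    if (y < 0 || y >= GRID_ROWS)
        return;
    if (x < e->left[y])  e->left[y] = x;
    if (x > e->right[y]) e->right[y] = x;
}

/* Fetch the range to traverse on scanline y. Returns 0 if the
 * scanline holds no sub-pixels and can be skipped entirely. */
int extents_get(const Extents *e, int y, int *left, int *right)
{
    if (y < 0 || y >= GRID_ROWS || e->left[y] > e->right[y])
        return 0;
    *left = e->left[y];
    *right = e->right[y];
    return 1;
}
```

The fill traverse would then step only from the recorded left extreme to the right extreme on each non-empty scanline instead of across the whole bounding box.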

The mask generator unit simply contains the “virtual pen” for the fill operation of incoming edge buffer sub-pixels and logic to calculate the resulting coverage. This data is then sent to the back buffer memory controller for combination into the back buffer (colour blending).

The following table shows approximate gate counts of various units inside the graphics engine and comments relating to the earlier description where appropriate.

    Unit name                       Gate count   Comment
    Input fifo                      3000         Preferably implemented as RAM
    Tesselator                      5000-8000    Curve tesselator as described above
    Control                         1400
    Ysort & Slope divide            6500         As start of edge draw code section above
    Fifo                            3300         Makes Sort and Clipper work in parallel
    Clipper                         8000         Removes edges that are outside the screen
    Edge traverse                   1300         Steps across the sub-pixel grid to set appropriate sub-pixels
    Fill traverse                   2200         Bounding box traverse. More gates required when optimised to skip non covered areas
    Mask generator                  1100         More gates required when linear and radial gradient logic added
    Edge buffer memory controller   2800         Includes last data cache
    Back buffer memory controller   4200         Includes alpha blending
    TOTAL                           ~40000

Integration of the Graphics Engine into the Display Module

FIG. 18 is a schematic representation of a display module 5 including a graphics engine 1 according to an embodiment of the invention, integrated in a source IC 3 for an LCD or equivalent type display 8. The CPU 2 is shown distanced from the display module 5. There are particular advantages for the integration of the engine directly with the source driver IC. Notably, the interconnection is within the same silicon structure, making the connection much more power efficient than separate packaging. Furthermore, no special I/O buffers and control circuitry are required. Separate manufacture and testing is not required and there is minimal increase in weight and size.

The diagram shows a typical arrangement in which the source IC of the LCD display also acts as a control IC for the gate IC 4.

FIG. 19 is a schematic representation of a display module 5 including a graphics engine 1 according to an embodiment of the invention, integrated in the display module and serving two source ICs 3 for an LCD or equivalent type display. The graphics engine can be provided on a graphics engine IC to be mounted on the reverse of the display module adjacent to the display control IC. It takes up minimal extra space within the device housing and is part of the display module package.

In this example, the source ICs 3 again act as controller for a gate IC 4. The CPU commands are fed into the graphics engine and divided in the engine into signals for each source IC.

FIG. 20 is a schematic representation of a display module 5 with an embedded source driver IC incorporating a graphics engine and its links to CPU, the display area and a gate driver IC. The figure shows in more detail the communication between these parts. The source IC, which is both the driver and controller IC, has a control circuit for control of the gate driver, LCD driver circuit, interface circuit and graphics accelerator. A direct link between the interface circuit and source driver (bypassing the graphics engine) allows the display to work without the graphics engine.

FIG. 21 shows component blocks in the display driver IC.

The power supply circuitry is not shown. It may be integrated or provided as a separate device. The power supply circuit depends on the type of the display used.

Furthermore, the gate (Y/row direction) driver circuitry is not shown in any detail, because a similar situation applies as for the power circuitry, and the type of gate driver is not relevant to the invention.

It should be noted that the combination of display control IC (source driver) and graphics engine does not necessarily exclude any of the functionality of the existing display control IC.

Interface Circuit with FIFO

The type of the interface used may depend on end-customer demand (for example 8 bit parallel, 16 bit parallel, various control signals). The interface 10 has the ability to control data flow in both directions. Data flow is primarily from the CPU; however, the possibility exists to read back data from the display memory (front buffer). Direct read/write may be used for low-level instructions or low level CPU interactions (BIOS level or similar).

The FIFO interface may be compatible/compliant with, for example, an Intel or Motorola standard peripheral interface bus or any custom type bus.

Control signals serve to perform handshaking for data transfer in either direction. For example, data transfer can be writing to a control register (control logic) to instruct the operation of the circuitry or reading a control/status register to verify the status of the circuitry or the status of the operation being performed (finished or not finished).

Generally there are two modes of operation of the interface circuit related to data flow:

- a) Basic mode, which writes to display memory directly (via data interface logic) bypassing graphics acceleration, or
- b) Accelerated mode, which sends high level commands to the graphics accelerator to interpret them.

The basic mode (writing directly into display memory) may be used in the following cases:

During power-on, a low level initialization routine (executed by the host CPU) may purge or initialize display memory in order to display low level (BIOS type) messages or to display a logo or other graphic.

Despite the presence of graphics acceleration the host CPU may directly access display memory to use the circuitry in legacy compatible mode (as in the prior art). This mode can be used for compatibility reasons if necessary.

The host CPU may read out the contents of the display memory in case it requires the information in order to perform a transformation on the image currently displayed.

The basic mode use in the above cases is based on bitmap image data format. The second, accelerated mode (b), in which data in the form of high level commands is sent to the graphics accelerator (via the command buffer/FIFO), is the mode which brings the key benefits described herein.

The curve tesselator 11, edge draw 12, edge buffer memory 13, scan-line filler 14 and back buffer blocks have previously been described in detail in relation to FIGS. 1 to 16.

Control Logic & Power Management

This central unit 7 controls overall operation of the circuitry. It is connected with the interface circuit and LCD timing control logic and controls all units of graphics acceleration, data exchange with the host CPU and access to display memory.

A set of control/status registers is used to control the operation of the circuit. The host CPU writes values to control registers (via the interface circuit) to assign the mode of operation and instruct the circuitry what to do with subsequent data coming from the host CPU. Accordingly a set of status registers is used to represent current status and progress/completion of previously issued instructions.

This unit also generates control and timing signals for all blocks of the graphics accelerator, data interface logic and for the LCD timing control logic block. These signals control all activities in the graphics accelerator part and steer data transfer between individual blocks up to data interface logic.

Further, this block controls the operation properties of the LCD timing control logic block, which controls all timing related to image refreshing on the display. Display refresh timing and the timing signals required for the operation of the graphics accelerator may be, but normally are not, synchronized. The data interface logic therefore has arbitration logic to enable smooth transfer of data between the two clock domains.

Power Management Function

Generally two modes help to save power during operation and in stand-by mode: a) Dynamic clock gating during operations on data and b) Static mode during stand-by mode.

Dynamic power management mode (a) controls all timing/clock signals to each individual block in a way to distribute/enable clock into only those blocks which are required to perform an operation on data. Clock signals for all other blocks are stopped (held high or low). This prevents unnecessary clocking of the circuitry in idle stage and thus saves power. The technique is called clock gating. Detection of activity is within the Control Logic and Power management unit and does not necessarily require CPU interaction.

Static power saving mode (b) is primarily used during stand-by time (most of the time for mobile devices) and thus extends stand-by time. This is implemented by locating all units/blocks of the circuitry, which are not used during stand-by time (for example all around the graphics accelerator circuit), in an isolated area with separate power supply pins. This area may still reside on the same silicon die, however, it is possible to switch it off by removing power supply for the isolated section. This is normally achieved using indirect host CPU interaction, as the CPU knows the state/mode of the mobile device.

Data Interface Logic

The data interface logic block 16 selects the data to be written into display memory or read out of it. One path (bypassing the graphics accelerator) feeds host CPU data into the display memory or the other way around, in case the CPU needs to read some or all of the image back into CPU memory. The other path transfers calculated image data from the graphics accelerator into display memory.

This block is also used to perform arbitration between circuitry of two different clock domains. The LCD driver portion performs transactions and operations under a clock (or a multiple of it) which enables an appropriate display update/refresh rate (for example 60 Hz). On the other side, graphics accelerator operation and interfacing with the host CPU run with a clock which allows sufficient acceleration performance and smooth interfacing with the host CPU. Arbitration enables smooth and (for the display) flicker-free transfer of image data to/from display memory, regardless of data origin (from the CPU or from the graphics accelerator).

Display Memory

This portion of memory 17 is also called the frame or front buffer. It holds image data for display. Either the host CPU or data from the graphics accelerator updates the contents of this memory. LCD timing control logic allows the contents to be regularly refreshed and sent to the display. In case of any animated contents, new image data will be written into display memory, and during the next refresh period (LCD timing control logic) this image will appear on the display. In case of a static image or for case of stand-by operation (also static image) the contents of the display memory will not be changed. It will only be regularly read out due to refreshing of the display.

This means that in stand-by mode or for a still image, all blocks before display memory may be switched to idle. Only the polling/monitoring functionality (in control logic & power management) has to run in order to trigger operation resume when the host CPU sends a new command.

The memory size is normally X*Y*CD (X is the dimension of the display in pixels, Y is the other dimension in pixels and CD is the colour depth, for example 16 bits for 65 k colours).
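As a worked instance of the X*Y*CD formula, the sketch below computes the display memory size in bits. The function name is hypothetical, and the 176×220 display size is borrowed from the clipper example earlier in this description.

```c
#include <stdint.h>

/* Display memory size in bits: X * Y * CD, where x and y are the
 * display dimensions in pixels and colour_depth is in bits per pixel. */
uint32_t display_memory_bits(uint32_t x, uint32_t y, uint32_t colour_depth)
{
    return x * y * colour_depth;
}
```

For a 176×220 pixel display with a 16 bit colour depth this gives 619,520 bits, that is, 77,440 bytes.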

Decoder & Display Latch

The decoder and display latch 18 converts bit image data stored in the display memory into column format. Each column for a pixel basically consists of three (sub) columns (RGB). Additionally, digital image information from the display memory is converted into analogue signals.

As display driver signals (source outputs) are analogue signals with amplitude and levels different from those used in logic circuitry, level shifting is performed in this block.

Finally, data latch registers hold the information for the time required to refresh one line (basically 1 pixel if we are talking in terms of 1 column). In the meantime, the LCD timing & control logic prepares the next set of data from the display memory to be latched and displayed (next line).

LCD Driver Circuitry

The LCD driver circuitry 19 prepares electrical signals to be applied to the display. This is an analogue type of circuitry and its actual construction heavily depends on the display type.

LCD Timing Control Logic

The LCD timing control logic unit 20 generates all timing and control signals for image refreshing on the display. It generates appropriate addressing and control signals to regularly update the display image with the content stored in the display memory. It initiates read-out of data from display memory (one line at a time), and passes it through the decoder & display data latch to be decoded and later passed through the LCD driver circuitry. The clock timing and frequency of this block enable an appropriate refresh rate of the display (e.g. 60 Hz). This block normally has its own oscillator and it is not synchronised with the rest of the circuitry around the graphics accelerator.

Gate Driver Control

The driver control block 21 represents the interface with the gate driver IC. It supplies signals to the gate driver IC to enable appropriate display refreshing. The exact details of this block depend on the type of display used.

The main function of this part is to sequentially scan all lines (rows) to generate the image in combination with information given by the source driver. In the case of amorphous TFT type displays the voltage level to drive gate (row) stripes may be in the range of +/−15V. This requires the gate driver IC to be realized in a different process/technology. Not all display types require such a voltage range and where there is no such requirement an integrated version of the gate driver and source driver can be realized on one silicon die (IC).

The main part of the gate driver is a shift register which shifts/moves a pulse from the beginning to the end of the display (from the top stripe down to the bottom stripe) in sequence. Some additional functionality such as pulse gating and shaping is also included in this part to obtain appropriate timing (to avoid overlaps, etc.). All the timing and pulse information comes from the display driver IC and is fully synchronized with it.

TFT Operation

Displays suitable for use with the invention may have a TFT (thin film transistor) structure. A TFT display has a matrix (X-Y) addressable display field with X (gate/row) and Y (source/column) conductive stripes. Voltage differences between the X and Y stripes control the degree of transmissibility of back-light. In colour displays there are 3 vertical (Y) stripes for each pixel to control RGB composition. FIG. 22 shows a TFT type structure and addressing as well as a typical timing diagram for the gate driver IC.

The display shown in FIG. 22 operates in a way to address one line (gate/row) at a time, proceeding to the next line and sequentially to the end (normally the bottom) of the display, and then resuming from the top. The speed of refreshing is called the refresh rate and may be in the range of 60 Hz (refreshes/second).

Source Driver Circuitry

FIG. 23 shows source driving for an LCD display, in which colourinformation from the front buffer is sent to the display. The pixelinformation for the entire row/line is read from display memory andapplied to DAC converters, such as the decoder shown at 18 in FIG. 21.The MUX transmission gate selector in FIG. 23 functions as a DAC. Thenumber of DAC converters required is three times the display pixelresolution (RGB). In this case the DAC converter also functions as ananalogue Multiplex/Selector. The digital value applied to DAC selectsone of the levels generated by a gray scale generator. For example,selecting “low intensity” gives a dark image, and consequently “highintensity” gives a bright image. Colour is composed on the display insimilar manner as in a CRT tube. This procedure is repeated for eachscan line.

The MUX transmission gate selector can also serve as a level shifter, since the voltages for the logic portion are normally lower than the voltage required to drive the Source line of the display. The Source Drive voltage is in the range of 0 V to 5 V. The Gray Scale Generator and MUX/Selector work with weak signals (determining intensity), and finally the signals selected by the MUX/Selector are amplified (AMP) appropriately in order to drive the source stripe.
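The selection-then-amplification path described above might be modelled as follows. This is a minimal sketch under stated assumptions: the gray scale generator is taken to produce evenly spaced normalised levels (real panels may use a non-linear spacing), the amplifier is modelled as a simple scaling to the 0 V to 5 V range, and the function names are hypothetical.

```python
def gray_scale_levels(bits):
    """Return 2**bits evenly spaced normalised levels from 0.0 to 1.0 (assumed linear)."""
    count = 2 ** bits
    return [i / (count - 1) for i in range(count)]


def source_voltages(row, levels, v_max=5.0):
    """Map one row of (R, G, B) digital codes to source-stripe voltages."""
    voltages = []
    for pixel in row:
        for code in pixel:
            selected = levels[code]             # MUX/selector picks one weak-signal level
            voltages.append(selected * v_max)   # AMP scales it to the drive range
    return voltages


levels = gray_scale_levels(2)        # 4 levels, for illustration
row = [(0, 3, 1), (3, 3, 3)]         # two pixels, three codes each
print(source_voltages(row, levels))
```

The output list holds three values per pixel, matching the three source stripes (RGB) per pixel.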

Although FIGS. 19 to 23 are specific to an LCD display, the invention is in no way limited to a single display type. Many suitable display types are known to the skilled person. These all have X-Y (column/row) addressing and differ from the specific LCD implementation shown above merely in driver implementation and terminology. Of course the invention is applicable to all LCD display types such as STN, amorphous TFT, LTPS (low temperature polysilicon) and LCoS displays. It is furthermore useful for LED-based displays, such as OLED (organic LED) displays.

For example, one particular application of the invention would be in an accessory for mobile devices in the form of a remote display worn or held by the user. The display may be linked to the device by Bluetooth or a similar wireless protocol.

In many cases the mobile device itself is so small that it is not practicable (or desirable) to add a high resolution screen. In such situations, a separate near to eye (NTE) or other display, possibly on a user headset or user spectacles, can be particularly advantageous.

The display could be of the LCoS type, which is suitable for wearable displays in NTE applications. NTE applications use a single LCoS display with a magnifier that is brought near to the eye to produce a magnified virtual image. A web-enabled wireless device with such a display would enable the user to view a web page as a large virtual image.

EXAMPLES

Display Variations

Where:

-   Display describes the resolution of the display (X*Y)
-   Pixels is the number of pixels on the display (=X*Y)
-   16 color bits is the actual amount of data needed to refresh/draw the full screen (assuming 16 bits to describe the properties of each pixel)
-   FrameRate@25 Mb/s describes the number of times the display may be refreshed per second assuming a data transfer rate of 25 Mbit/second
-   Mb/s@15 fps represents the data transfer speed required to assure 15 full-screen updates/second.

Display      Pixels    16 color bits   FrameRate@25 Mb/s   Mb/s@15 fps
128 × 128    16384     262144          95.4                3.9
144 × 176    25344     405504          61.7                6.1
176 × 208    36608     585728          42.7                8.8
176 × 220    38720     619520          40.4                9.3
176 × 240    42240     675840          37.0                10.1
240 × 320    76800     1228800         20.3                18.4
320 × 480    153600    2457600         10.2                36.9
480 × 640    307200    4915200         5.1                 73.7
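The figures in the table follow directly from the definitions above. The sketch below recomputes a few rows, assuming that 1 Mbit is 10^6 bits (an assumption, although it is the one that reproduces the tabulated values); the function and constant names are illustrative.

```python
BITS_PER_PIXEL = 16               # 16 colour bits per pixel
LINK_RATE_BITS_PER_S = 25_000_000 # 25 Mbit/second transfer rate
TARGET_FPS = 15                   # full-screen updates per second


def display_row(x, y):
    """Return (pixels, bits per frame, frame rate at 25 Mb/s, Mb/s at 15 fps)."""
    pixels = x * y
    frame_bits = pixels * BITS_PER_PIXEL
    frame_rate = LINK_RATE_BITS_PER_S / frame_bits
    mbit_per_s = frame_bits * TARGET_FPS / 1_000_000
    return pixels, frame_bits, round(frame_rate, 1), round(mbit_per_s, 1)


for x, y in [(128, 128), (176, 240), (480, 640)]:
    print(f"{x} x {y}:", display_row(x, y))
```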

Examples of power consumption for different interfaces:

Interface              Power     Power per Mbit
CMADS i/f @ 25 Mb/s    0.5 mW    20 μW/Mb
CMOS i/f @ 25 Mb/s     1 mW      40 μW/Mb
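The per-Mbit figures are obtained by dividing the interface power by its data rate. A minimal sketch of that conversion, with a hypothetical function name:

```python
def microwatts_per_mbit(power_mw, rate_mbit_per_s):
    """Convert interface power (mW) at a given rate (Mb/s) to μW per Mb/s."""
    return power_mw * 1000 / rate_mbit_per_s


print(microwatts_per_mbit(0.5, 25))   # CMADS interface
print(microwatts_per_mbit(1.0, 25))   # CMOS interface
```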

The following four bus traffic examples demonstrate the traffic reduction on the CPU→Display bus:

(NOTE: these examples demonstrate only BUS traffic but not CPU load).

Case1: Full Screen of Kanji Text (Static)

This represents a complex situation. The display size of 176×240 results in 42240 pixels, or 84480 Bytes (16 bit/pixel = 2 Bytes/pixel). Assuming a minimum of 16×16 pixels for a kanji character, this gives 165 kanji characters per screen. One kanji character may on average be described in about 223 Bytes, resulting in an overall amount of 36855 Bytes of data.

Full Screen Display      Value    Per kanji
Byte                     84480
Pix                      42240    16     <-- X * Y for one Kanji
Y-pix                    240      15
X-pix                    176      11
# kanji                           165
Bytes/Kanji (SVG)                 223

Traffic BitMap    Traffic SVG
84480             36855

In this particular case the use of the SVG accelerator would require 36 Kbyte to be transferred, whereas a Bitmap Refresh (= refresh or draw of the full screen without using the accelerator) results in 84 Kbyte of data to be transferred (a 56% reduction).
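The Case 1 figures can be checked with a few lines of arithmetic. The sketch below takes the 36855-Byte SVG total from the text as given rather than deriving it; the function and constant names are illustrative assumptions.

```python
BYTES_PER_PIXEL = 2   # 16 bits per pixel, as stated in the text
KANJI_CELL = 16       # minimum kanji cell of 16 x 16 pixels


def bitmap_bytes(width, height):
    """Bytes needed to redraw the full screen as a bitmap."""
    return width * height * BYTES_PER_PIXEL


def kanji_per_screen(width, height):
    """Number of whole 16 x 16 kanji cells that fit on the screen."""
    return (width // KANJI_CELL) * (height // KANJI_CELL)


def reduction_percent(bitmap, vector):
    """Percentage by which vector traffic is smaller than bitmap traffic."""
    return 100 * (1 - vector / bitmap)


bitmap = bitmap_bytes(176, 240)
print(bitmap, kanji_per_screen(176, 240), round(reduction_percent(bitmap, 36855)))
```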

Due to the basic property of SVG (scalable), the 36 Kbytes of data remain unchanged regardless of the screen resolution, assuming the same number of characters. This is not the case in a bit-mapped system, where the traffic grows proportionally with the number of pixels (X*Y).

Case2: Animated (@15 fps) Busy Screen (165 Kanji Characters) (Display 176×240)

         BitMap     SVG
         84480      36855
fps      15
         1267200    552825    bits
μW       40
         50.7       22.1      μW for Bus

40 represents 40 μW/Mbit of data. FIG. 25 shows data transfer and corresponding power usage between the CPU and graphics engine and between the graphics engine and display.
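The bus power figures can be reproduced as sketched below. The sketch follows the arithmetic exactly as tabulated: the per-frame traffic figure is multiplied by the frame rate, and the 40 μW/Mbit factor is applied directly to that per-second total. The function name and parameter names are assumptions for illustration.

```python
UW_PER_MBIT = 40   # 40 μW/Mbit, the CMOS interface figure used in the text


def bus_power_uw(frame_traffic, fps=15, uw_per_mbit=UW_PER_MBIT):
    """Return (traffic per second, bus power in μW) as tabulated in the text."""
    per_second = frame_traffic * fps
    return per_second, per_second / 1_000_000 * uw_per_mbit


for label, traffic in [("BitMap", 84480), ("SVG", 36855)]:
    per_second, power = bus_power_uw(traffic)
    print(label, per_second, round(power, 1))
```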

Case3: Filled Triangle over Full Screen

Full Screen

-   Bitmap (= without accelerator): 84480 Bytes of data (screen 176×240, 16 bit colour);
-   SVG accelerator: only 16 Bytes (99.98% reduction).

Case4: Animated (@15 fps) Rotating Filled Triangle (Display 176×240)

         BitMap     SVG
         84480      16
fps      15
         1267200    240       bits
μW       40
         50.7       0.01      μW for Bus

40 represents 40 μW/Mbit of data. FIG. 26 shows data transfer and corresponding power usage between the CPU and graphics engine and between the graphics engine and display.

This last example shows the suitability of the graphics engine for use in games, such as animated Macromedia Flash™ based games.

1. A display driver integrated circuit, for connection to a small-area display, the integrated circuit including a hardware-implemented graphics engine for receiving vector graphics commands and rendering image data for display pixels in dependence upon the received commands, and also including display driver circuitry for driving the connected display in accordance with the image data rendered by the graphics engine.
2. A display module for incorporation in a portable electrical device and including: a display; a hardware-implemented graphics engine for receiving vector graphics commands and rendering image data for display pixels in dependence upon the received commands; and display driver circuitry connected to the graphics engine and to the display for driving the display in accordance with the image data rendered by the graphics engine.
3. A display driver or module according to claim 1 wherein the graphics engine includes control circuitry to read in one vector graphics command at a time, convert the command to spatial image information and then discard the original command.
4. A display driver or module according to claim 1 wherein the graphics engine includes edge-drawing circuitry linked to an edge buffer to store sequentially the edges of any polygon read into the engine.
5. A display driver or module according to claim 4 wherein the edge buffer is arranged to store sub-pixels, a plurality of sub-pixels corresponding to each display pixel.
6. A display driver or module according to claim 5 wherein each sub-pixel is switchable between set and unset states and wherein the edge buffer stores each polygon edge as boundary sub-pixels which are set and whose positions in the edge buffer correspond to the edge position in the final image.
7. A display driver or module according to claim 4 wherein the graphics engine includes filler circuitry to fill in polygons whose edges have been stored in the edge buffer.
8. A display driver or module according to claim 1 wherein the graphics engine includes a back buffer to store part or all of an image before transfer to a front buffer of the display memory.
9. A display driver or module according to claim 8 wherein each pixel of the back buffer is mapped to a pixel in the front buffer and the back buffer preferably has the same number of bits per pixel as the front buffer to represent the color (RGBA value) of each display pixel.
10. A display driver or module according to claim 8 wherein the graphics engine includes combination circuitry to combine sequentially each filled polygon from the filler circuitry into the back buffer.
11. A display driver or module according to claim 1 wherein the color of each pixel stored in the back buffer is determined in dependence on the color of the pixel in the polygon being processed, the percentage of the pixel covered by the polygon and the color already present in the corresponding pixel in the back buffer.
12. A display driver or module according to claim 3 wherein the edge buffer comprises sub-pixels in the form of a grid having a square number of sub-pixels for each display pixel.
13. A display driver or module according to claim 12 wherein every other sub-pixel in the edge buffer is not utilized, so that half the square number of sub-pixels is provided for each display pixel.
14. A display driver or module according to claim 12 wherein the slope of each polygon edge is calculated from the edge end points and then sub-pixels of the grid are set along the line.
15. A display driver or module according to claim 13 wherein the following rules are used for setting sub-pixels: one sub-pixel only per horizontal line of the sub-pixel grid is set for each polygon edge; the sub-pixels are set from top to bottom (in the Y direction); the last sub-pixel of the line is not set; any sub-pixels set under the line are inverted.
16. A display driver or module according to claim 12 wherein the filler circuitry includes logic acting as a virtual pen traversing the sub-pixel grid, which pen is initially off and toggles between the off and on states each time it encounters a set sub-pixel.
17. A display driver or module according to claim 16 wherein the virtual pen sets all sub-pixels inside the boundary sub-pixels, and includes boundary pixels for right-hand boundaries, and clears boundary pixels for left-hand boundaries or vice versa.
18. A display driver or module according to claim 10 wherein the sub-pixels from the filler circuitry corresponding to a display pixel are amalgamated into a single pixel before combination to the back buffer.
19. A display driver or module according to claim 12 wherein the number of sub-pixels of each amalgamated pixel covered by the filled polygon determines a blending factor for combination of the amalgamated pixel into the back buffer.
20. A display driver or module according to claim 8 wherein the back buffer is copied to the front buffer of the display memory once the image on the part of the display for which it holds information has been entirely rendered.
21. A display driver or module according to claim 8 wherein the back buffer is of the same size as the front buffer and holds information for the whole display.
22. A display driver or module according to claim 8 wherein the back buffer is smaller than the front buffer and stores the information for part of the display only, the image in the front buffer being built from the back buffer in a series of external passes.
23. A display driver or module according to claim 22 wherein only commands relevant to the part of the image to be held in the back buffer are sent to the graphics engine in each external pass.
24. A display driver or module according to claim 4 wherein the graphics engine further includes a curve tessellator to divide any curved polygon edges into straight-line segments and store the resultant segments in the edge buffer.
25. A display driver or module according to claim 8 wherein the graphics engine is adapted so that the back buffer can hold one or more predetermined image elements, which are transferred to the front buffer at one or more locations determined by the high level language.
26. A display driver or module according to claim 4 wherein the graphics engine is operable in hairline mode, in which mode hairlines are stored in the edge buffer by setting sub-pixels in a bitmap and storing the bitmap in multiple locations in the edge buffer to form a line.
27. A display driver or module according to claim 1 wherein the graphics engine is less than 100K gates in size and preferably less than 50K.
28. The display driver or module according to claim 1 wherein the display driver circuitry is for one direction of the display only.
29. The display driver or module according to claim 1 wherein the display driver circuitry also includes control circuitry for control of the display.
30. The display driver or module according to claim 29 wherein the display control circuitry also includes driver control circuitry for connection to a separate display driver for the other direction.
31. The display module according to claim 2 wherein the graphics engine renders image data for a plurality of display driver integrated circuits.
32. A display driver or module according to claim 1, the display driver further including display memory, decoder and display latch and timing, data interface logic, control logic and power management circuitry.
33. An electrical device including: a processing unit; and a display unit having a display, wherein the processing unit sends high-level graphics commands to the display unit and a hardware-implemented graphics engine is provided in the display unit to render image data for display pixels in accordance with the high-level commands.
34. An electrical device according to claim 33 further incorporating a portable electrical device and including: a display; a hardware-implemented graphics engine for receiving vector graphics commands and rendering image data for display pixels in dependence upon the received commands; and display driver circuitry connected to the graphics engine and to the display for driving the display in accordance with the image data rendered by the graphics engine.
 35. (canceled)